Introduction

  1. Creating a map
  2. Plotting a map
  3. Quality assuring a map

I’m assuming you have a basic understanding of antigenic cartography. If not, you can learn more here

Making a map

library(ggplot2)

library(Racmacs)
options(RacOptimizer.num_cores = parallel::detectCores())

Reading in data

First we need to read some data in. The way to read data in depends on the format.

data1 <- read.csv("data/small-dataset.csv", row.names=1)
# replace "/" in sera names
colnames(data1) <- gsub(".", "/", colnames(data1), fixed=T) 
map <- acmap(titer_table = data1)

Making a map

You don’t always get the same map as the starting coordinates are random - that’s why we run many optimisations. Before each map-making, I’m setting a random seed so the answer is reproducible.

ran_seed <- 5387

You can make the map directly from the table or from the acmap object we just created. These methods are effectively equivalent.

set.seed(ran_seed)
map_from_table <- make.acmap(titer_table = data1, number_of_dimensions = 2, number_of_optimizations = 100)
## a
## Warning: The following ANTIGENS have do not have enough titrations to position in 2 dimensions. Coordinates were still optimized but positions will be unreliable
## 
## 'EN/7/94'
## 'FI/339/95'
## Took 0.14 secs
## Warning in optimizeMap(map = map, number_of_dimensions = number_of_dimensions,
## : There is some variation (4.18 AU for one point) in the top runs, this may be
## an indication that more optimization runs could help achieve a better optimum.
## If this still fails to help see ?unstableMaps for further possible causes.
set.seed(ran_seed)
map_from_map <- optimizeMap(map = map, number_of_dimensions = 2, number_of_optimizations = 100)
## a
## Warning: The following ANTIGENS have do not have enough titrations to position in 2 dimensions. Coordinates were still optimized but positions will be unreliable
## 
## 'EN/7/94'
## 'FI/339/95'
## Took 0.1 secs
## Warning in optimizeMap(map = map, number_of_dimensions = 2,
## number_of_optimizations = 100): There is some variation (4.18 AU for one point)
## in the top runs, this may be an indication that more optimization runs could
## help achieve a better optimum. If this still fails to help see ?unstableMaps
## for further possible causes.

If you a sensible reason, you can fix the column base:

set.seed(ran_seed)
map_fixed_col_base <- optimizeMap(map = map, number_of_dimensions = 2, fixed_column_bases = rep(8, 21), number_of_optimizations = 100)
## a
## Warning: The following ANTIGENS have do not have enough titrations to position in 2 dimensions. Coordinates were still optimized but positions will be unreliable
## 
## 'EN/7/94'
## 'FI/339/95'
## Took 0.47 secs
## Warning in optimizeMap(map = map, number_of_dimensions = 2, fixed_column_bases
## = rep(8, : There is some variation (5.03 AU for one point) in the top runs,
## this may be an indication that more optimization runs could help achieve a
## better optimum. If this still fails to help see ?unstableMaps for further
## possible causes.

Or set a minimum column base:

set.seed(ran_seed)
map_min_col_base <- optimizeMap(map = map, number_of_dimensions = 2, minimum_column_basis = "1280", number_of_optimizations = 100)
## a
## Warning: The following ANTIGENS have do not have enough titrations to position in 2 dimensions. Coordinates were still optimized but positions will be unreliable
## 
## 'EN/7/94'
## 'FI/339/95'
## Took 0.15 secs
## Warning in optimizeMap(map = map, number_of_dimensions = 2,
## minimum_column_basis = "1280", : There is some variation (4.26 AU for one
## point) in the top runs, this may be an indication that more optimization runs
## could help achieve a better optimum. If this still fails to help see
## ?unstableMaps for further possible causes.

You can also make a map in 3D. We will check later if this is a good idea.

set.seed(ran_seed)
map_3d <- optimizeMap(map = map, number_of_dimensions = 3, number_of_optimizations = 100)
## a
## Warning: The following ANTIGENS are too underconstrained to position in 3 dimensions and coordinates have been set to NaN:
## 
## 'EN/7/94'
## 'FI/339/95'
## Warning: The following ANTIGENS have do not have enough titrations to position in 3 dimensions. Coordinates were still optimized but positions will be unreliable
## 
## 'MA/G252/93'
## 'NL/372/93'
## 'NL/399/93'
## 'FI/338/95'
## 'FI/381/95'
## 'HK/49/95'
## 'NL/271/95'
## 'HK/357/96'
## 'HK/358/96'
## 'LY/1781/96'
## 'JO/10/97'
## 'OS/21/97'
## 'OS/244/97'
## Took 0.22 secs
## Warning in optimizeMap(map = map, number_of_dimensions = 3,
## number_of_optimizations = 100): There is some variation (3.79 AU for one point)
## in the top runs, this may be an indication that more optimization runs could
## help achieve a better optimum. If this still fails to help see ?unstableMaps
## for further possible causes.

Saving your map

save.acmap(map_from_map, "out/map_from_map.ace")

Merging tables

Remember when merging tables that names & IDs need to be identical

First, open the file with the table to be merged.

data2 <- read.csv("data/small-dataset-2.csv", row.names=1)
# replace "/" in sera names
colnames(data2) <- gsub(".", "/", colnames(data2), fixed=T) 

map2 <- acmap(titer_table= data2)

The simplest method is to merge table together. You can also give the function a list of maps to merge.

merge_table <- mergeMaps(map, map2, method="table")
## a
## Warning: The 'conservative' titer merge method was used when merging titers, which differs to the 'likelihood' method employed in older Racmacs versions. You can suppress this warning by setting the merge method explicitly with "merge_options = list(method = 'conservative')")
## This warning is displayed once every 8 hours.
merge_map <- optimizeMap(merge_table, 2, 100)
## a
## Warning: The following ANTIGENS have do not have enough titrations to position in 2 dimensions. Coordinates were still optimized but positions will be unreliable
## 
## 'EN/7/94'
## 'FI/339/95'
## Took 0.11 secs
## Warning in optimizeMap(merge_table, 2, 100): There is some variation (4.57 AU
## for one point) in the top runs, this may be an indication that more
## optimization runs could help achieve a better optimum. If this still fails to
## help see ?unstableMaps for further possible causes.

If you already have a map, you can merge your new data in either by using the original map as a starting point, or freezing the original map and then merging new data in (this can be useful when you are merging mutants in and want to keep the background map consistent). I won’t demonstrate the frozen merge here as the two datasets completely overlap.

merge_incremental <- mergeMaps(map_from_map, map2, method="incremental-merge", number_of_dimensions = 2, number_of_optimizations = 100)
## a

There are some additional methods described here that may be applicable to your research.

Plotting a map

plot(map_from_map, plot_stress=T)

ggplot(map_from_map, plot_stress=T)

view(map_from_map)

Adding sequence data

Largely copied from here and there is also a youtube tutorial

Sequence data is added as a matrix, so if you start with a fasta file you will need to do a little pre-processing (aligning and trimming your sequences & then changing from a list to a matrix).

ag_sequences <- read.csv(
  file = system.file("extdata/h3map2004_sequences_ag.csv", package = "Racmacs"),
  colClasses = "character",
  row.names = 1
)

sr_sequences <- read.csv(
  file = system.file("extdata/h3map2004_sequences_sr.csv", package = "Racmacs"),
  colClasses = "character",
  row.names = 1
)

agSequences(map_from_map) <- ag_sequences[agNames(map_from_map),]
srSequences(map_from_map) <- sr_sequences[srNames(map_from_map),]

Colours, shapes and sizes

Color, shape, size, outline colour and outline width can be customised for antigens and sera. A few examples below. You can find more detail on styling maps here

# color by year
yr <- as.numeric(paste0(c(rep("19", 36), rep("20", 13)), sapply(strsplit(agNames(map_from_map), split="/", fixed=T), "[[", 3)))
agFill(map_from_map) <- rainbow(12)[yr-1992]

view(map_from_map)
agFill(map_from_map) <- "grey60"
# color by genetics
agFill(map_from_map)[agSequences(map_from_map)[,156]=="K"] <- "forestgreen"
agFill(map_from_map)[agSequences(map_from_map)[,156]=="Q"] <- "skyblue"
agFill(map_from_map)[agSequences(map_from_map)[,156]=="H"] <- "gold"

agSize(map_from_map)[c("PM/2007/99", "SY/5/97", "NA/933/95", "WU/359/95")] <- 10

agShape(map_from_map)[c(36, 29, 11, 13)] <- "EGG"
srShape(map_from_map)[c("PM/2007/99", "SY/5A/97", "SY/5B/97", "SY/5HAY/97", "SY/5V/97", "NA/933/95","WU/359B/95")] <- "UGLYEGG"

view(map_from_map)

Quality assurance

Stress

The stress is the measure of how well the data fits the map (lower stress is better). It can be calculated for the whole map or per antigen & per serum

mapStress(map_from_map)
## [1] 151.5363
hist(agStress(map_from_map))

hist(srStress(map_from_map))

hist(agStressPerTiter(map_from_map))

hist(srStressPerTiter(map_from_map))

ams <- allMapStresses(map_from_map)
head(ams)
## [1] 151.5363 153.0233 155.2882 155.9242 156.1003 156.1613
plot(ams)

If the top values of the stress are not similar, then the optimisation to make the map has not converged and either more optimisations are required or there are other issues with your data.

We can also compare stress between different merging methods

mapStress(merge_map)
## [1] 155.7719
mapStress(merge_incremental)
## [1] 154.5767

Viewer features (connection lines, error lines, stress dots)

view(map_from_map)

From the “Coloring” tab you can colour by stress; higher stress points are ones where the antigenic map fits the data less well.

From the “Diagnostics” tab, you can select an antigen and turn on

  • connection lines (button “C” or ctrl-c) which show which sera have been titrated against that antigen
  • error lines (button “E” or ctrl-e) which show the error in positioning. Red is too close and blue is too far. If both the antigen and sera are moved to the end of their line then there would be no difference between the map and table distances for that measurement.
  • titre values (no button, ctrl-t)

Procrustes

We use procrustes to compare two maps with antigens & sera in common.

First, we need to get another map to compare with (in that I need to remoce the antigen & serum IDs from the 2004 map so that the algorithm matches by name).

map_2004 <- read.acmap("data/seq-t9a-mod27.ace")
srIDs(map_2004) <- rep("", numSera(map_2004))
agIDs(map_2004) <- rep("", numAntigens(map_2004))
map_from_map <- realignMap(map_from_map, map_2004)
pc <- procrustesMap(map_from_map, map_2004)
view(pc)
pc_data <- procrustesData(map_from_map, map_2004)
names(pc_data)
## [1] "ag_dists"   "sr_dists"   "ag_rmsd"    "sr_rmsd"    "total_rmsd"
pc_data$total_rmsd
## [1] 1.264435

You can see another example of procrustes, including 3D procrustes, here

We can also perform a procrustes to the other optimisations and plot this.

pc_rmsd <- pc_rmedsd <- rep(NA, numOptimizations(map_from_map))
for (i in 1:numOptimizations(map_from_map)){
  pc_output <- procrustesData(map_from_map, map_from_map, 1, i)
  pc_rmsd[i] <- pc_output$total_rmsd
  pc_rmedsd[i] <- sqrt(median(c(pc_output$ag_dists, pc_output$sr_dists)^2))
}

plot(pc_rmsd)

plot(ams, pc_rmsd)

plot(pc_rmedsd)

plot(ams, pc_rmedsd)

plot(pc_rmsd, pc_rmedsd, asp=1)
abline(0,1)

Ideally there will be a low procrustes distance between the top optimisations. If maps with similar stress have high procrustes distances, then the map is metastable. The RMSD (root mean square deviation) is affected by outliers such as a small number of antigens or sera hemisphering so the RMedSD (root median square deviation) can be a more robust measure.

Point uncertainty (blob)

Point uncertainty can be visualised using bootsrapping where data is randomly excluded and the map re-made. Here I’ve used the default parameters of 1000 repeats and 100 optimisations per repeat. I’ve commented out the code as it might take a while to run on your machine - you can simply load the completed map for plotting.

There’s more detail on the bootstrapping methods here

# set.seed(ran_seed)
# bs_map <-bootstrapMap(map_from_map, method = "resample")
# save.acmap(bs_map, "out/bs_map.ace")
# blob95 <- bootstrapBlobs(bs_map, , conf.level=0.95)
# blob68 <- bootstrapBlobs(bs_map)
# blob10 <- bootstrapBlobs(bs_map, conf.level=0.1)
# save.acmap(blob95, "out/blob95_map.ace")
# save.acmap(blob68, "out/blob68_map.ace")
# save.acmap(blob10, "out/blob10_map.ace")

blob68 <- read.acmap("out/blob68_map.ace")
plot(blob68, plot_stress=T)
view(blob68)
blob10 <- read.acmap("out/blob10_map.ace")
plot(blob10, plot_stress=T)

view(blob10)

Dealing with warnings

Firstly, we can exclude the antigens that are uncoordinated and giving the first warning.

map_from_map_rm <- removeAntigens(map_from_map, c('EN/7/94', 'FI/339/95'))
set.seed(ran_seed)
map_from_map_rm <- optimizeMap(map = map_from_map_rm, number_of_dimensions = 2, number_of_optimizations = 100)
## Discarding previous optimization runs.
## a
## Took 0.23 secs
## Warning in optimizeMap(map = map_from_map_rm, number_of_dimensions = 2, : There
## is some variation (2.42 AU for one point) in the top runs, this may be an
## indication that more optimization runs could help achieve a better optimum. If
## this still fails to help see ?unstableMaps for further possible causes.
pc <- procrustesMap(map_from_map_rm, map_from_map_rm, 1, 2)
view(pc)

Then we can remove other antigens that are potentially poorly coordinated

hist(rowSums(data1!="*"), xlab="Number of measured data points per antigen", breaks=1:20)

map_from_map_rm2 <- removeAntigens(map_from_map, which(rowSums(data1!="*")<4))
set.seed(ran_seed)
# map_from_map_rm2 <- optimizeMap(map = map_from_map_rm2, number_of_dimensions = 2, number_of_optimizations = 100)

map_from_map_rm2 <- removeSera(map_from_map_rm2, 'MA/G252/93')
set.seed(ran_seed)
map_from_map_rm2 <- optimizeMap(map = map_from_map_rm2, number_of_dimensions = 2, number_of_optimizations = 100)
## Discarding previous optimization runs.
## a
## Took 0.08 secs
## Warning in optimizeMap(map = map_from_map_rm2, number_of_dimensions = 2, :
## There is some variation (2.86 AU for one point) in the top runs, this may be an
## indication that more optimization runs could help achieve a better optimum. If
## this still fails to help see ?unstableMaps for further possible causes.
pc <- procrustesMap(map_from_map_rm2, map_from_map_rm2, 1, 2)
view(pc)

We still have an unstable map. This is probably due to the limitations of the data and if this was a real experiment, we would consider making further measurments.

Dimension testing

You can check how well different numbers of dimensions represent the underlying data. The stress will usually decrease when increasing the number of dimensions.

mapStress(map_from_map)
## [1] 151.5363
mapStress(map_3d)
## [1] 130.9359

By removing data, we can test how well this removed data is predicted by maps with differing numbers of dimensions (cross-validation). The default setting test 1-5 dimesions, removing 10% of the data, 1000 optimisation and 100 replicates per dimension. Here I’ve saved the output of the code as it takes a while to run.

# dim_test <- dimensionTestMap(map_from_map)
# save(dim_test, file="out/dim_test.RData")

load("out/dim_test.RData")
dim_test
##   dimensions mean_rmse_detectable var_rmse_detectable mean_rmse_nondetectable
## 1          1             1.389009          0.04285565                1.221791
## 2          2             1.229333          0.11286930                1.167984
## 3          3             1.255879          0.15759603                1.290677
## 4          4             1.253939          0.19611440                1.320766
## 5          5             1.286569          0.33971046                1.328394
##   var_rmse_nondetectable replicates
## 1              0.1651072        100
## 2              0.1506297        100
## 3              0.1439530        100
## 4              0.1370128        100
## 5              0.1315355        100
#plotting for detectable titres
ci <- 1.96*sqrt(dim_test$var_rmse_detectable/100)
plot(dim_test$dimensions, dim_test$mean_rmse_detectable, ylim=range(c(dim_test$mean_rmse_detectable+ci, dim_test$mean_rmse_detectable-ci)))
arrows(dim_test$dimensions, dim_test$mean_rmse_detectable+ci, dim_test$dimensions, dim_test$mean_rmse_detectable-ci, length=0.1, angle=90, code=3)

The minimum is at 2 dimensions, suggesting this is best for this dataset. Sometimes you may want to use fewer dimensions for interpretability if there is marginal benefit with higher dimensions.

FAQs

How to get distance between all antigens?

dist_mat <- as.matrix(dist(agCoords(map_from_map)))

How to handle repeated sera within the same table?

If they are sufficiently similar, they can be merged.

How many sera should you have overlapping when merging tables?

Map cohesion calculates the vertex connectivity of the map (the minimum number of point that need to be removed to eliminate all paths from one point to another). If you have n overlapping sera & antigens between merged tables, then the vertex connectivity will be 2n (assuming that non of these titres are undetectable). A vertex connectivity of “number of map dimensions + 1” is required for map stability. Therefore, 3 overlapping antigens and sera (vertex connectivity = 6) should be sufficient if the map has 3 dimensions (minimum vertex connectivity = 4).

# original map
mapCohesion(map_from_map)
## [1] 2
# map with uncoordinated antigens removed
mapCohesion(map_from_map_rm)
## [1] 3
# map with potentialy poorly coordinated antigens removed
mapCohesion(map_from_map_rm2)
## [1] 5